Abstract.

An amended MaxEnt formulation for systems displaced from the conventional MaxEnt equilibrium is proposed. This formulation involves the minimization of the Kullback-Leibler divergence to a reference $Q$ (or maximization of Shannon Q-entropy), subject to a constraint that involves a second reference distribution $P_1$ and tunes the new equilibrium. In this setting, the equilibrium distribution is the generalized escort distribution associated with $P_1$ and $Q$. Accounting for an additional constraint, an observable given by a statistical mean, leads to the maximization of Renyi/Tsallis Q-entropy subject to that constraint. Two natural scenarios for this observation constraint are considered, and the classical and generalized constraints of nonextensive statistics are recovered. The solutions to the maximization of Renyi Q-entropy subject to the two types of constraints are derived. These optimum distributions, which are Levy-like distributions, are self-referential. We then propose two 'alternate' (but effectively computable) dual functions, whose maximization makes it possible to identify the optimum parameters. Finally, a duality between solutions and the underlying Legendre structure are presented.
INTRODUCTION

The formalism of nonextensive statistical mechanics [1, 2] leads to a generalized Boltzmann factor in the form of a Tsallis distribution (or factor) that depends on an entropic index and recovers the classical Boltzmann factor as a special limit case [1]. This distribution is of high interest in many physical systems since it makes it possible to model power-law phenomena. In a wide variety of fields, experiments, numerical results and analytical derivations agree fairly well with the description by a Tsallis distribution.

Tsallis distributions (sometimes called Levy distributions) are derived by maximization of Tsallis entropy [3] under suitable constraints. The usual formulation is as follows: maximize Tsallis entropy

$$T_\alpha(P) = \frac{1}{1-\alpha}\left(\int P(x)^\alpha\,dx - 1\right) \qquad (1)$$
subject to

$$m = \int x\,P^*(x)\,dx \quad\text{with}\quad P^*(x) = \frac{P(x)^\alpha}{\int P(x)^\alpha\,dx}, \qquad (2)$$

where the mean constraint is called a 'generalized' mean constraint in the nonextensive literature, and $P^*(x)$ is called the 'escort' distribution. This formulation was preferred to the simple maximization with a classical mean constraint $m = \int x P(x)\,dx$ because of mathematical difficulties. The solution is given in the literature as

$$P(x) = \frac{1}{Z}\left[1 - \frac{(1-\alpha)\beta}{\int P(x)^\alpha\,dx}\,(x-m)\right]^{\frac{1}{1-\alpha}}, \qquad (3)$$

where $Z$ is a partition function.
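As a concrete illustration of (1) and (2), the following minimal sketch evaluates the Tsallis entropy, the escort distribution, and the classical and generalized means of a density discretized on a uniform grid, with integrals replaced by midpoint Riemann sums. The exponential test density, the grid and the helper names are illustrative assumptions, not part of the original text. For $P(x) = e^{-x}$ one has $\int P^\alpha dx = 1/\alpha$, so $T_\alpha(P) = 1/\alpha$ and the generalized mean is $1/\alpha$, while the Shannon entropy (the $\alpha \to 1$ limit) equals 1.

```python
import numpy as np

def tsallis_entropy(p, dx, alpha):
    """Midpoint-rule approximation of T_alpha(P) = (int P^alpha dx - 1) / (1 - alpha), eq. (1)."""
    return (np.sum(p**alpha) * dx - 1.0) / (1.0 - alpha)

def shannon_entropy(p, dx):
    """-int P log P dx, the alpha -> 1 limit of T_alpha."""
    return -np.sum(p * np.log(p)) * dx

def escort(p, dx, alpha):
    """Escort distribution P^alpha / int P^alpha dx, eq. (2)."""
    w = p**alpha
    return w / (np.sum(w) * dx)

# Illustrative density: P(x) = exp(-x), truncated to [0, 30) and renormalized on the grid.
dx = 1e-3
x = np.arange(0.0, 30.0, dx) + dx / 2      # midpoints of the grid cells
p = np.exp(-x)
p /= np.sum(p) * dx

alpha = 0.7
classical_mean = np.sum(x * p) * dx                          # close to 1
generalized_mean = np.sum(x * escort(p, dx, alpha)) * dx     # close to 1/alpha
print(classical_mean, generalized_mean)
print(tsallis_entropy(p, dx, 0.5), shannon_entropy(p, dx))   # close to 2 and 1
```

The gap between the two means is exactly what distinguishes the classical constraint from the generalized one.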

Of course, these distributions do not coincide with those derived by conventional MaxEnt and consequently cannot be justified from a probabilistic point of view, because of the uniqueness of the rate function in large deviations theory [4, 5]. Furthermore, the status and interest of generalized expectations and of escort distributions are unclear. Last, the expression of distribution (3) is implicit, since $P(x)$ appears on both sides, so that both its manipulation and the determination of its parameter $\beta$ will be difficult.

However, in view of the success of nonextensive statistics, there should exist a probabilistic setting that provides a justification for the maximization of Tsallis entropy. There are now several indications that results of nonextensive statistics are physically relevant for partially equilibrated or nonequilibrated systems, with a stationary state characterized by fluctuations of an intensive parameter [6, 7]; for instance, the Tsallis factor is obtained from the Boltzmann-Gibbs factor if the inverse temperature fluctuates according to a gamma distribution.

In this paper, I present a framework for the maximization of Renyi/Tsallis Q-entropy that leads to the so-called Levy distribution (or Tsallis factor). The Renyi information divergence, the opposite of Renyi Q-entropy, is given by

$$D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\int P(x)^\alpha Q(x)^{1-\alpha}\,dx, \qquad (4)$$

where $\alpha$ is a real parameter called the entropic index. Using L'Hospital's rule, the Kullback-Leibler divergence is recovered for $\alpha \to 1$:

$$D(P\|Q) = \int P(x)\log\frac{P(x)}{Q(x)}\,dx. \qquad (5)$$

Its opposite is the Shannon Q-entropy, the correct, coordinate-invariant extension of the classical Shannon entropy to the continuous case [8]. This divergence can be interpreted as a "distance" between two distributions. Renyi and Tsallis Q-entropies are related by a simple monotonic function; therefore, their maximization under the same constraint leads to the same distribution.
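The following sketch evaluates (4) and (5) on a grid and checks the monotonic link between the Renyi and Tsallis Q-entropies. The illustrative densities are unit-variance Gaussians with means 1 and 0, for which $D_\alpha(P\|Q) = \alpha/2$ and $D(P\|Q) = 1/2$ analytically. The Tsallis Q-entropy is taken here, as an assumption of this sketch, to be $(\int P^\alpha Q^{1-\alpha}dx - 1)/(1-\alpha)$, which reduces to (1) when $Q \equiv 1$. With the Renyi Q-entropy $R = -D_\alpha(P\|Q)$, this gives $T = (e^{(1-\alpha)R} - 1)/(1-\alpha)$, an increasing function of $R$.

```python
import numpy as np

def renyi_divergence(p, q, dx, alpha):
    """D_alpha(P||Q) = log(int P^alpha Q^(1-alpha) dx) / (alpha - 1), eq. (4)."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha)) * dx) / (alpha - 1.0)

def kl_divergence(p, q, dx):
    """D(P||Q) = int P log(P/Q) dx, eq. (5)."""
    return np.sum(p * np.log(p / q)) * dx

def tsallis_q_entropy(p, q, dx, alpha):
    """Assumed convention: (int P^alpha Q^(1-alpha) dx - 1) / (1 - alpha); equals (1) when Q = 1."""
    return (np.sum(p**alpha * q**(1.0 - alpha)) * dx - 1.0) / (1.0 - alpha)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

dx = 1e-3
x = np.arange(-12.0, 12.0, dx) + dx / 2
p, q = gaussian(x, 1.0, 1.0), gaussian(x, 0.0, 1.0)

for alpha in (0.5, 0.9, 0.999):
    print(alpha, renyi_divergence(p, q, dx, alpha))    # analytically alpha / 2 here
print(kl_divergence(p, q, dx))                         # analytically 1/2

# Monotone link: with R = -D_alpha (Renyi Q-entropy), T = expm1((1 - alpha) R) / (1 - alpha).
alpha = 0.5
r = -renyi_divergence(p, q, dx, alpha)
t = tsallis_q_entropy(p, q, dx, alpha)
print(t, np.expm1((1.0 - alpha) * r) / (1.0 - alpha))
```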

In the following, I propose an amended MaxEnt formulation for systems with a displaced equilibrium, find that the relevant entropy in this setting is the Renyi entropy, interpret the mean constraints, derive the correct form of the solutions, propose numerical procedures for estimating the parameters of the Tsallis factor, and characterize the associated entropies. I will also indicate a duality between the solutions associated with the classical and generalized mean constraints. Finally, I will discuss the underlying Legendre structure of the generalized thermodynamics associated with this setting.



THE AMENDED MAXENT FORMULATION 

A key to the emergence of Levy distributions, and to a probabilistic justification, might be that they seem to appear in the case of a modified, perturbed, or displaced classical Boltzmann-Gibbs equilibrium. This means that the original MaxEnt formulation, "find the closest distribution to a reference under a mean constraint", may be amended by introducing, for instance, a new constraint that displaces the equilibrium. The partial or displaced equilibrium may be imagined as an equilibrium characterized by two references, say $P_1$ and $Q$. Instead of selecting the nearest distribution to a reference under a mean constraint, we may look for a distribution $P^*$ simultaneously close to two distinct references: such a distribution will be localized somewhere 'between' the two references $P_1$ and $Q$. For instance, we may consider a global system composed of two subsystems characterized by two prior reference distributions. The global equilibrium is attained for some intermediate distribution, and the observable may be, depending on the viewpoint or on the experiment, either the mean under the distribution of the global system or the mean under the distribution of one subsystem. This can model a fragmentation process: a system $H(A,B)$ fragments into $A$, with distribution $P_1$, and $B$, with distribution $Q$, and the whole system is viewed with a distribution $P^*$ that is some intermediate between $P_1$ and $Q$. This can also model a phase transition: a system leaves a state $Q$ toward $P_1$ and presents an intermediate distribution $P^*$.

This can be stated as: find $P^*$ such that the Kullback-Leibler divergence to $Q$, $D(P\|Q)$, is minimum (or equivalently the Shannon Q-entropy is maximum), but under the constraint that $D(P\|Q) = D(P\|P_1) + \theta$, where $\theta$ can be expressed as a log-likelihood. The problem simply writes

$$\begin{cases} \min_P D(P\|Q) = \min_P \displaystyle\int P(x)\log\frac{P(x)}{Q(x)}\,dx \\ \text{s.t. } \theta = D(P\|Q) - D(P\|P_1) = \displaystyle\int P(x)\log\frac{P_1(x)}{Q(x)}\,dx \end{cases} \qquad (6)$$



and its solution was given by Kullback [9, page 39] as an illustration of his general theorem on the constrained minimization of $D(P\|Q)$:

$$P^*(x) = \frac{P_1(x)^\alpha Q(x)^{1-\alpha}}{\int P_1(x)^\alpha Q(x)^{1-\alpha}\,dx}, \qquad (7)$$

which is nothing else but the escort distribution (2) of nonextensive statistics [10] (although it is generalized here with reference $Q$). The parameter $\alpha$ is simply the Lagrange parameter associated with the constraint, and it can be shown that necessarily $\alpha < 1$. Clearly, distribution $P^*$, which is the weighted geometric mean of $P_1$ and $Q$, realizes a trade-off, governed by $\alpha$, between the two references. By dual attainment, we have

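$$\left\{\begin{array}{l}\min_P D(P\|Q)\\ \text{s.t. } \theta = D(P\|Q) - D(P\|P_1)\end{array}\right. = \sup_\alpha\left(\alpha\theta - \log\int P_1(x)^\alpha Q(x)^{1-\alpha}\,dx\right). \qquad (8)$$

In this last relation, the term $\log\int P_1(x)^\alpha Q(x)^{1-\alpha}\,dx$ is directly proportional to the Renyi divergence (4): it equals $(\alpha-1)D_\alpha(P_1\|Q)$.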
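A small numerical sketch of (7) and (8) follows, under illustrative assumptions: $P_1 = \mathcal{N}(1,1)$ and $Q = \mathcal{N}(0,1)$ discretized on a grid, and $\theta = 0.2$. For these choices one can check analytically that $P^* = \mathcal{N}(\alpha,1)$, that the constraint of (6) evaluates to $\theta = \alpha - 1/2$, and that $\log\int P_1^\alpha Q^{1-\alpha}dx = \alpha(\alpha-1)/2$. The dual (8) is therefore maximized at $\alpha = \theta + 1/2 = 0.7$, with value $\alpha^2/2 = 0.245 = D(P^*\|Q)$. The maximization uses a plain golden-section search (a choice of this sketch, not of the text), which is valid because $\alpha \mapsto \log\int P_1^\alpha Q^{1-\alpha}dx$ is convex, so the dual is concave.

```python
import numpy as np

def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal function f on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):              # the maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                        # the maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

def gaussian(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

dx = 1e-3
x = np.arange(-12.0, 12.0, dx) + dx / 2
p1, q = gaussian(x, 1.0), gaussian(x, 0.0)    # illustrative references P1 and Q
theta = 0.2                                   # illustrative value of the constraint in (6)

def log_z(alpha):
    """log int P1^alpha Q^(1-alpha) dx, which equals (alpha - 1) D_alpha(P1||Q)."""
    return np.log(np.sum(p1**alpha * q**(1.0 - alpha)) * dx)

def dual(alpha):
    """Objective of the supremum in (8)."""
    return alpha * theta - log_z(alpha)

alpha = golden_max(dual, 0.0, 1.0)
w = p1**alpha * q**(1.0 - alpha)
p_star = w / (np.sum(w) * dx)                         # weighted geometric mean, eq. (7)

theta_check = np.sum(p_star * np.log(p1 / q)) * dx    # constraint of (6)
primal = np.sum(p_star * np.log(p_star / q)) * dx     # D(P*||Q)
print(alpha, theta_check, primal, dual(alpha))        # analytically 0.7, 0.2, 0.245, 0.245
```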



Observable mean values 

Observable values are, as usual, statistical means under some distributions. Depending on the viewpoint, the observable may be a mean under distribution $P_1$, the distribution of an isolated subsystem, or under $P^*$, the equilibrium distribution between $P_1$ and $Q$. Hence, the problem will be completed by an additional constraint, and a possible approach would be to select distribution $P_1$ by further minimizing the Kullback-Leibler information divergence $D(P\|Q)$, but now over $P_1(x)$ and subject to the mean constraint. So, the whole problem writes

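$$\mathcal{K} = \min_{P_1}\left\{\begin{array}{l} \min_P D(P\|Q) = \min_P \displaystyle\int P(x)\log\frac{P(x)}{Q(x)}\,dx \\ \qquad \text{subject to: } \theta = \displaystyle\int P(x)\log\frac{P_1(x)}{Q(x)}\,dx \\ \text{subject to: } m = E_{P_1}[X] \text{ or } m = E_{P^*}[X] \end{array}\right. \qquad (9)$$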

where $E_P[X]$ represents the statistical mean under distribution $P$: $E_P[X] = \int x P(x)\,dx$. This may be tackled in two steps: first, minimize with respect to $P$ taking into account the mean log-likelihood constraint, which yields (7); second, minimize with respect to $P_1$. Taking into account (8), problem (9) becomes



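$$\mathcal{K} = \sup_\alpha\left(\alpha\theta - \left\{\begin{array}{l}\max_{P_1}\,(\alpha-1)D_\alpha(P_1\|Q)\\ \text{subject to: } m = E_{P_1}[X] \text{ or } m = E_{P^*}[X]\end{array}\right.\right) \qquad (10)$$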



and amounts to the extremization of the Renyi information divergence under a mean constraint. Therefore, we find that the amended MaxEnt formulation leads to the maximization of Renyi (or equivalently Tsallis) entropy subject to a statistical mean constraint. We note that the second constraint, $m = E_{P^*}[X]$, is nothing else but the 'generalized expectation' of nonextensive statistics, which here has a clear interpretation.

It is important to note that the minimization of the Kullback-Leibler divergence with respect to $P$ and $P_1$, subject to the two constraints, may not always reduce to the two-step procedure above.



SOLUTIONS TO THE MAXIMIZATION OF RENYI Q-ENTROPY

We now consider the maximization of Renyi Q-entropy subject to the classical mean constraint (C) $m = E_{P_1}[X]$ and the generalized mean constraint (G) $m = E_{P^*}[X]$, as obtained in (10). We begin with some results on a general 'Tsallis' distribution that simplify the derivation of exact solutions (proofs are omitted to save space).



Preliminary results 
Definition 1. Distribution $P_\nu(x)$ is defined by

$$P_\nu(x) = [\gamma(x-\bar{x})+1]^{\nu}\, Q(x)\, e^{D_\alpha(P_\nu\|Q)} \qquad (11)$$

on the domain $\mathcal{D} = \mathcal{D}_Q \cap \mathcal{D}_\gamma$, where $\mathcal{D}_Q = \{x : Q(x) > 0\}$ and $\mathcal{D}_\gamma = \{x : \gamma(x-\bar{x})+1 > 0\}$. In this expression, $\bar{x}$ is either (a) a fixed parameter, say $m$, in which case $P_\nu(x)$ is a two-parameter distribution, or (b) some statistical mean with respect to $P_\nu(x)$, e.g. its "classical" or "generalized" mean, and as such a function of $\gamma$. Observe that distribution $P_\nu(x)$ is not necessarily normalized to one. Associated with $P_\nu(x)$, we also define a partition function

$$Z_\nu(\gamma,\bar{x}) = \int_{\mathcal{D}} [\gamma(x-\bar{x})+1]^{\nu}\, Q(x)\,dx. \qquad (12)$$

Notation 2. We denote by $E_\nu[X]$ the statistical mean with respect to the probability distribution associated with $P_\nu(x)$, and by $E_\nu^{(\alpha)}[X]$ the generalized $\alpha$-mean. One can observe that in the case of the Levy distribution (11), we have $E_\nu^{(\alpha)}[X] = E_{\alpha\nu}[X]$. In the special case $\nu = \pm\xi$, we obtain $E_{\pm\xi}^{(\alpha)}[X] = E_{\pm(\xi+1)}[X]$, because $\xi\alpha = \xi+1$ for $\xi = 1/(\alpha-1)$.

Theorem 3. The Levy distribution $P_\xi(x)$ with exponent $\nu = \xi$ is normalized to one if and only if $\bar{x} = E_\xi[X]$, the statistical mean of the distribution, and then $D_\alpha(P_\xi\|Q) = -\log Z_{\xi+1}(\gamma,\bar{x}) = -\log Z_\xi(\gamma,\bar{x})$.

In the same way, the Levy distribution $P_{-\xi}(x)$ with exponent $\nu = -\xi$ is normalized to one if and only if $\bar{x} = E_{-\xi-1}[X] = E_{-\xi}^{(\alpha)}[X]$, the generalized $\alpha$-expectation of the distribution, and then $D_\alpha(P_{-\xi}\|Q) = -\log Z_{-(\xi+1)}(\gamma,\bar{x}) = -\log Z_{-\xi}(\gamma,\bar{x})$, with $\alpha\xi = \xi+1$.

When $\bar{x}$ is a fixed parameter $m$, this will only be true for a special value $\gamma^*$ of $\gamma$ such that $E_\xi[X] = m$ or $E_{-\xi}^{(\alpha)}[X] = m$, respectively in the first and second cases.
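One way to see where the "if and only if" of Theorem 3 comes from is the elementary identity $Z_{\nu+1}(\gamma,\bar{x}) = Z_\nu(\gamma,\bar{x})\,[1 + \gamma(E_\nu[X] - \bar{x})]$, obtained by writing $[\gamma(x-\bar{x})+1]^{\nu+1} = [\gamma(x-\bar{x})+1]^{\nu}\,[\gamma(x-\bar{x})+1]$ and integrating. For $\gamma \neq 0$ it gives $Z_{\xi+1} = Z_\xi$ exactly when $\bar{x} = E_\xi[X]$ (take $\nu = \xi$), and $Z_{-\xi} = Z_{-\xi-1}$ exactly when $\bar{x} = E_{-\xi-1}[X]$ (take $\nu = -\xi-1$). This is offered as a sketch, not as the omitted proof. The check below uses an illustrative uniform $Q$ on $[0,1]$, $\alpha = 0.8$ and arbitrary $(\gamma, \bar{x})$ chosen so that $\gamma(x-\bar{x})+1 > 0$ on the whole support.

```python
import numpy as np

dx = 1e-4
x = np.arange(0.0, 1.0, dx) + dx / 2       # grid on [0, 1]
q = np.ones_like(x)                        # illustrative reference: uniform Q on [0, 1]

alpha = 0.8
xi = 1.0 / (alpha - 1.0)                   # xi = -5 here, and xi * alpha = xi + 1

def partition(nu, gamma, xbar):
    """Partition function Z_nu(gamma, xbar) of eq. (12), discretized on the grid."""
    return np.sum((gamma * (x - xbar) + 1.0) ** nu * q) * dx

def mean_nu(nu, gamma, xbar):
    """Mean of the probability distribution proportional to [gamma(x - xbar) + 1]^nu Q(x)."""
    w = (gamma * (x - xbar) + 1.0) ** nu * q
    return np.sum(x * w) / np.sum(w)

gamma, xbar = 0.4, 0.3                     # arbitrary values, positive base on [0, 1]
for nu in (xi, -xi - 1.0):                 # classical (nu = xi) and generalized (nu = -xi - 1) cases
    lhs = partition(nu + 1.0, gamma, xbar)
    rhs = partition(nu, gamma, xbar) * (1.0 + gamma * (mean_nu(nu, gamma, xbar) - xbar))
    print(nu, lhs, rhs)                    # equal, so Z_{nu+1} = Z_nu iff xbar = E_nu[X]
```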

Remark 4. An important remark is in order here on the mapping $\bar{x} \leftrightarrow \gamma$. Consider the normalized distribution $P_\xi(x)$ with $\bar{x} = E_\xi[X]$. This distribution depends on the sole parameter $\gamma$, and $\bar{x}$ is a function of $\gamma$. But contrary to intuition, the mapping $\bar{x} \leftrightarrow \gamma$ is not necessarily one-to-one. This means that a specified value of the mean $\bar{x} = m$ may correspond to several values of $\gamma$, and conversely a specified value of $\gamma$ may give several different means $\bar{x}$. This can be illustrated through numerical examples.

Lemma 5. Partition functions $Z_{\xi+1}(\gamma,m)$ and $Z_{-\xi}(\gamma,m)$ are convex functions of $\gamma$.

Solutions 

The solutions to the maximization of Renyi Q-entropy subject to the classical mean constraint (C) $m = E_{P_1}[X]$ and the generalized mean constraint (G) $m = E_{P^*}[X]$ are found using standard Lagrangian techniques. The optimum solution, see for instance [11], is a saddle point of the Lagrangian, and we may proceed in two steps: first minimize the Lagrangian in $P(x)$, thus obtaining a solution in terms of the Lagrange parameters, and then maximize the resulting Lagrangian, the dual function, in order to exhibit the optimum Lagrange parameters. Taking into account the normalization conditions described above, these solutions are easily derived and simplified:

$$\text{(C)}\quad P_C(x) = \frac{[\gamma(x-\bar{x})+1]^{\xi}}{Z_\xi(\gamma,\bar{x})}\,Q(x), \quad\text{with } \bar{x} = E_{P_C}[X] = E_\xi[X] \qquad (13)$$

$$\text{(G)}\quad P_G(x) = \frac{[\gamma(x-\bar{x})+1]^{-\xi}}{Z_{-\xi}(\gamma,\bar{x})}\,Q(x), \quad\text{with } \bar{x} = E^{(\alpha)}_{P_G}[X] = E_{-(\xi+1)}[X] \qquad (14)$$

where $\xi = 1/(\alpha-1)$ and $Z_\nu(\gamma,\bar{x})$ is the partition function. It is important to emphasize that $\bar{x}$ in (13) is the statistical mean with respect to $P_C(x)$, and $\bar{x}$ in (14) is the generalized $\alpha$-mean with respect to $P_G(x)$; as such, both are functions of $\gamma$. It is a common mistake in the large majority of reported results and calculations to take improperly for $\bar{x}$ the fixed value $m$ of the constraint, which is only correct for the optimum value of the Lagrange parameter.

These optimum distributions appear to be self-referential, since their expressions 
involve their statistical mean. Therefore, the direct determination of their parameters 
is difficult, if not intractable. 



Alternate dual functions 

From Lagrangian theory, one should maximize the dual function in order to obtain the remaining Lagrange parameter. But in the present cases, the dual functions are implicitly defined. Thus, in order to identify the value of the natural parameter associated with the mean constraints, I propose two 'alternate' (but effectively computable) dual functions, whose numerical maximization makes it possible to exhibit the optimum parameters.

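For the classical mean, I just sketch the procedure. At the optimum, we have $D(\gamma^*) = \sup_\gamma \sup_\mu \inf_P L(P,\gamma,\mu)$. For any value $\tilde{\mu}$ of $\mu$, letting $\tilde{D}(\gamma) = L(P^*_{\gamma,\tilde{\mu}},\gamma,\tilde{\mu})$, we have $D(\gamma) \ge \tilde{D}(\gamma)$. Thus, if $D(\gamma^*) = \tilde{D}(\gamma^*)$ for the optimum $\gamma^*$, then $\tilde{D}(\gamma) \le D(\gamma) \le D(\gamma^*) = \tilde{D}(\gamma^*)$ for all $\gamma$, so $\tilde{D}(\gamma^*)$ is a maximum of $\tilde{D}(\gamma)$, and the maximization of the dual function can equivalently be carried out via the maximization of $\tilde{D}(\gamma)$. Condition $D(\gamma^*) = \tilde{D}(\gamma^*)$ is achieved with $\tilde{\mu}(\gamma) = -(\xi+1)(1-\gamma m)$. Then, after some algebra, we obtain the very simple form

$$D_C(\gamma) = -\log Z_{\xi+1}(\gamma,m), \qquad (15)$$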

which is simply the expression of the divergence from $P^*$ to $Q$, $D_\alpha(P^*\|Q)$. We know that $Z_{\xi+1}(\gamma,m)$ is a convex function of $\gamma$. Thus, if $Z_{\xi+1}(\gamma,m)$ is defined on a single interval, $D_C(\gamma)$ has a unique maximum, at $\gamma = \gamma^*$. If $Z_{\xi+1}(\gamma,m)$ is defined (and convex) on several intervals, $D_C(\gamma)$ may have a maximum on each of these intervals, and one has to select the minimum of these maxima (that is, the maximum associated with the minimum divergence). Hence, the identification of the optimum parameter $\gamma^*$ simply amounts to the unconstrained maximization of a unimodal function, possibly on several intervals.
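The classical scheme (15) can be sketched numerically under illustrative assumptions: a uniform reference $Q$ on $[0,1]$, $\alpha = 0.8$ (so $\xi = -5$) and an observed mean $m = 0.3$. Since $Z_{\xi+1}(\gamma,m)$ is convex in $\gamma$, $D_C$ is unimodal, and a golden-section search (a choice of this sketch; any 1-D maximizer would do) finds $\gamma^*$. Because $\xi+1 < 0$, $Z_{\xi+1}$ is finite here only if $\gamma(x-m)+1 > 0$ on the whole support, so the search is confined to that single interval. At the maximizer, the classical mean of $P_C$ in (13) should return $m$, and $D_\alpha(P_C\|Q)$ should coincide with $D_C(\gamma^*)$, in line with Theorem 3.

```python
import numpy as np

def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal function f on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):              # the maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                        # the maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

dx = 1e-4
x = np.arange(0.0, 1.0, dx) + dx / 2
q = np.ones_like(x)          # illustrative reference Q: uniform density on [0, 1]
alpha, m = 0.8, 0.3          # illustrative entropic index and observed classical mean
xi = 1.0 / (alpha - 1.0)     # xi = -5, so xi + 1 = -4 < 0

def partition(nu, gamma):
    """Z_nu(gamma, m) of eq. (12) with xbar fixed to m."""
    return np.sum((gamma * (x - m) + 1.0) ** nu * q) * dx

def dual_c(gamma):
    """Alternate dual function D_C of eq. (15)."""
    return -np.log(partition(xi + 1.0, gamma))

# Keep gamma(x - m) + 1 > 0 on [0, 1], i.e. -1/(1 - m) < gamma < 1/m (shrunk by 1%).
gamma_star = golden_max(dual_c, -0.99 / (1.0 - m), 0.99 / m)

w = (gamma_star * (x - m) + 1.0) ** xi * q
p_c = w / (np.sum(w) * dx)                                  # P_C of eq. (13) with xbar = m
mean_c = np.sum(x * p_c) * dx                               # should recover m
renyi = np.log(np.sum(p_c**alpha * q**(1.0 - alpha)) * dx) / (alpha - 1.0)
print(gamma_star, mean_c, renyi, dual_c(gamma_star))        # renyi matches dual_c(gamma_star)
```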

For the generalized mean, the rationale for an alternate dual function is as follows. We know that $D_\alpha(P_{-\xi}\|Q) = -\log Z_{-\xi}(\gamma,m)$ when the generalized mean constraint is satisfied. Since

$$\frac{\partial \log Z_{-\xi}(\gamma,m)}{\partial\gamma} = -\xi\,\frac{Z_{-(\xi+1)}(\gamma,m)}{Z_{-\xi}(\gamma,m)}\,(\bar{x}-m), \qquad \bar{x} = E_{-(\xi+1)}[X],$$

$-\log Z_{-\xi}(\gamma,m)$ is maximum when the constraint $\bar{x} = m$ is satisfied. Hence, the search for the optimum Lagrange parameter can be carried out using the very simple alternate dual function

$$D_G(\gamma) = -\log Z_{-\xi}(\gamma,m). \qquad (16)$$

The partition function $Z_{-\xi}(\gamma,m)$ is a convex function of $\gamma$ for $\alpha < 1$. If it is defined on a single interval, $D_G(\gamma)$ has a unique maximum, reached at the $\gamma^*$ such that $m = E_{-\xi-1}[X]$, the generalized $\alpha$-mean. If the domain consists of several intervals, then $D_G(\gamma)$ may present several maxima, and the minimum of these maxima, associated with the minimum divergence $D_\alpha(P_{-\xi}\|Q)$, has to be selected. We thus obtain two practical numerical schemes for the identification of the distribution parameters, and it is also possible to study the behaviour of the entropies associated with some particular references $Q$. We close this presentation by considering the relationship between the two minimization problems and an underlying Legendre structure.
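The generalized scheme (16) can be sketched in the same way, again with an illustrative uniform $Q$ on $[0,1]$, $\alpha = 0.8$ and $m = 0.3$, now interpreted as a generalized mean. Here $-\xi = 5 > 0$, so $P_G$ vanishes outside $\mathcal{D}_\gamma$ and can have truncated support; the sketch integrates only over $\mathcal{D}_\gamma$ by clipping the base at zero, which keeps $Z_{-\xi}$ convex in $\gamma$. The search bracket $[-50, 50]$ is an arbitrary choice. At the maximizer, the mean under the escort of $P_G$ should return $m$, and $D_\alpha(P_G\|Q)$ should equal $D_G(\gamma^*)$.

```python
import numpy as np

def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal function f on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):              # the maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                        # the maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

dx = 1e-4
x = np.arange(0.0, 1.0, dx) + dx / 2
q = np.ones_like(x)          # illustrative reference Q: uniform density on [0, 1]
alpha, m = 0.8, 0.3          # illustrative entropic index and observed generalized mean
xi = 1.0 / (alpha - 1.0)     # xi = -5, so -xi = 5 > 0

def partition(nu, gamma):
    """Z_nu(gamma, m), integrating only over D_gamma = {gamma(x - m) + 1 > 0}."""
    base = np.maximum(gamma * (x - m) + 1.0, 0.0)
    return np.sum(base**nu * q) * dx

def dual_g(gamma):
    """Alternate dual function D_G of eq. (16)."""
    return -np.log(partition(-xi, gamma))

gamma_star = golden_max(dual_g, -50.0, 50.0)

base = np.maximum(gamma_star * (x - m) + 1.0, 0.0)
p_g = base ** (-xi) * q / partition(-xi, gamma_star)     # P_G of eq. (14) with xbar = m
w = p_g**alpha * q**(1.0 - alpha)                        # unnormalized escort of P_G
generalized_mean = np.sum(x * w) / np.sum(w)             # should recover m
renyi = np.log(np.sum(w) * dx) / (alpha - 1.0)           # D_alpha(P_G || Q)
print(gamma_star, generalized_mean, renyi, dual_g(gamma_star))
```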



DUALITY AND LEGENDRE STRUCTURE 

The $\alpha \leftrightarrow 1/\alpha$ duality

The dual functions associated with the two problems are $-\log Z_{\xi_1+1}(\gamma,m)$ and $-\log Z_{-\xi_2}(\gamma,m)$. Thus, we will have pointwise equality of the dual functions, and of course of the optima, if $\xi_1+1 = -\xi_2$, that is, if the indices $\alpha_1$ and $\alpha_2$ satisfy $\alpha_1 = 1/\alpha_2$. We can also remark that with $-\xi_2 = \xi_1+1 = \alpha_1\xi_1$, we have the following relations between the two optimum probability density functions:

$$P_G = \frac{P_C^{\alpha_1}Q^{1-\alpha_1}}{\int P_C^{\alpha_1}Q^{1-\alpha_1}\,dx} \quad\text{and}\quad P_C = \frac{P_G^{\alpha_2}Q^{1-\alpha_2}}{\int P_G^{\alpha_2}Q^{1-\alpha_2}\,dx}, \quad\text{with } \alpha_2 = 1/\alpha_1, \qquad (17)$$

using the fact that $Z_{\xi_1+1}(\gamma,m) = Z_{\xi_1}(\gamma,m)$ for the optimum value of $\gamma$. This means that $P_G$ is the escort distribution of $P_C$ with index $\alpha_1$, and that $P_C$ is the escort distribution associated with $P_G$ and index $\alpha_2$. It can be checked in the general case that we always have the equality $D_{1/\alpha}(P^*\|Q) = D_\alpha(P_1\|Q)$ between the $1/\alpha$ Renyi divergence of the escort distribution to $Q$ and the standard $\alpha$ divergence. Hence, the minimization of the $\alpha$ Renyi divergence subject to the generalized mean constraint is exactly equivalent to the minimization of the $1/\alpha$ Renyi divergence subject to the classical mean constraint, so that generalized and classical mean constraints can always be swapped, provided the index $\alpha$ is changed into $1/\alpha$, as was argued in [12, 13].
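The general identity $D_{1/\alpha}(P^*\|Q) = D_\alpha(P_1\|Q)$ follows from $(P^*)^{1/\alpha}Q^{1-1/\alpha} \propto P_1$: the escort map with index $1/\alpha$ undoes the one with index $\alpha$. The sketch below checks both facts numerically for an illustrative Gaussian $P_1 = \mathcal{N}(1,1)$ and Laplace reference $Q$, with $\alpha = 0.6$; any other pair of positive densities on the grid would do.

```python
import numpy as np

def escort(p, q, dx, a):
    """Generalized escort P^a Q^(1-a) / int P^a Q^(1-a) dx, as in eq. (7)."""
    w = p**a * q**(1.0 - a)
    return w / (np.sum(w) * dx)

def renyi(p, q, dx, a):
    """Renyi divergence D_a(P||Q), eq. (4)."""
    return np.log(np.sum(p**a * q**(1.0 - a)) * dx) / (a - 1.0)

dx = 1e-3
x = np.arange(-15.0, 15.0, dx) + dx / 2
q = 0.5 * np.exp(-np.abs(x))                    # illustrative Q: Laplace density
p1 = np.exp(-0.5 * (x - 1.0) ** 2)              # illustrative P1: Gaussian N(1, 1)
p1 /= np.sum(p1) * dx                           # normalize on the grid

alpha = 0.6
p_star = escort(p1, q, dx, alpha)               # P* of eq. (7)
print(renyi(p_star, q, dx, 1.0 / alpha), renyi(p1, q, dx, alpha))    # equal
# The escort map with index 1/alpha undoes the one with index alpha:
print(np.max(np.abs(escort(p_star, q, dx, 1.0 / alpha) - p1)))       # round-off level
```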



The Legendre structure 



In the study of alternative entropies, considerable efforts have been directed to the analysis of the associated thermodynamics. The concave entropies corresponding to our two problems are $S_C = \log Z_{\xi+1}\!\left(-\frac{\lambda}{\xi+1}, \bar{x}\right)$ and $S_G = \log Z_{-\xi}\!\left(\frac{\lambda}{\xi}, \bar{x}\right)$. Let us consider the general form $S = \log Z(\gamma, \bar{x})$.

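In terms of the Lagrange multiplier $\lambda$, we have

$$\frac{dS}{d\lambda} = \frac{\partial S}{\partial\gamma}\frac{d\gamma}{d\lambda} + \frac{\partial S}{\partial\bar{x}}\frac{d\bar{x}}{d\lambda}. \qquad (18)$$

Specializing this result to the two entropies, for which $\partial S/\partial\gamma = 0$ and $\partial S/\partial\bar{x} = \lambda$ when the distributions are normalized, we obtain in both cases the Euler formula

$$\frac{dS}{d\lambda} = \lambda\,\frac{d\bar{x}}{d\lambda}. \qquad (19)$$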

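Next, the derivative of the entropy with respect to the mean is simply

$$\frac{dS}{d\bar{x}} = \frac{dS}{d\lambda}\frac{d\lambda}{d\bar{x}} = \lambda\,\frac{d\bar{x}}{d\lambda}\frac{d\lambda}{d\bar{x}} = \lambda. \qquad (20)$$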

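Let us now introduce the Massieu potential $\phi(\lambda) = S - \lambda\bar{x}$ (or, equivalently, the free energy). Differentiating with respect to the Lagrange parameter and to the mean gives

$$\frac{d\phi}{d\lambda} = -\bar{x}, \quad\text{and}\quad \frac{d\phi}{d\bar{x}} = -\bar{x}\,\frac{d\lambda}{d\bar{x}}. \qquad (21)$$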

These four relations show that $S$ and $\phi$ are conjugate, with variables $\bar{x}$ and $\lambda$: $S[\bar{x}] \leftrightarrow \phi[\lambda]$, so that the basic Legendre structure of thermodynamics is preserved (but care must be taken with interpretations; for instance, a valid definition of temperature requires that $\lambda$ always remain positive).
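Relation (20) can be checked numerically in the classical case. The sketch assumes that the Lagrange multiplier is tied to $\gamma$ by $\lambda = -(\xi+1)\gamma^*$ (an assumption of this sketch, corresponding to writing $S_C$ as a function of $\lambda$). It takes $S_C(m) = \log Z_{\xi+1}(\gamma^*(m), m)$, i.e. minus the maximum of (15), for an illustrative uniform $Q$ on $[0,1]$ and $\alpha = 0.8$, and compares a central finite difference of $S_C$ in $m$ with $\lambda$. The step $h$ is an arbitrary choice; agreement is expected up to $O(h^2)$.

```python
import numpy as np

def golden_max(f, a, b, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal function f on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):              # the maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                        # the maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

dx = 1e-4
x = np.arange(0.0, 1.0, dx) + dx / 2
q = np.ones_like(x)              # illustrative reference Q: uniform density on [0, 1]
alpha = 0.8
xi = 1.0 / (alpha - 1.0)         # xi = -5

def solve_classical(m):
    """Return (gamma*, S_C) for the classical problem with mean m, via the dual (15)."""
    def log_z(gamma):
        return np.log(np.sum((gamma * (x - m) + 1.0) ** (xi + 1.0) * q) * dx)
    gamma_star = golden_max(lambda g: -log_z(g), -0.99 / (1.0 - m), 0.99 / m)
    return gamma_star, log_z(gamma_star)     # S_C = -D_alpha(P_C||Q) = log Z_{xi+1}

m, h = 0.3, 1e-3
gamma_star, _ = solve_classical(m)
lam = -(xi + 1.0) * gamma_star               # assumed relation gamma = -lambda / (xi + 1)
dS_dm = (solve_classical(m + h)[1] - solve_classical(m - h)[1]) / (2.0 * h)
print(lam, dS_dm)                            # dS/dxbar = lambda, eq. (20)
```

Here $\lambda$ comes out positive, since the entropy increases as $m$ moves toward the mean of $Q$.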